
Conversation


@micmelesse micmelesse commented Jul 10, 2024

This is an implementation of flash_attn_with_kvcache using a Triton flash attention decode kernel. It adds the following to the decode kernel:

  • key masking on the KV cache
  • in-place KV cache updates
  • causal masking
  • ALiBi
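The features above can be illustrated with a minimal NumPy reference sketch of single-token decode attention (this is not the Triton kernel from this PR; the function name, shapes, and ALiBi handling here are illustrative assumptions):

```python
import numpy as np

def decode_attention_with_kvcache(q, k_cache, v_cache, k_new, v_new,
                                  cache_seqlen, alibi_slope=0.0):
    """Illustrative single-query decode attention with a KV cache.

    q:            (d,)          query for the newly decoded token
    k_cache:      (max_seq, d)  key cache, updated in place
    v_cache:      (max_seq, d)  value cache, updated in place
    k_new, v_new: (d,)          new key/value written at slot cache_seqlen
    cache_seqlen: number of valid entries already in the cache
    """
    # In-place KV cache update: write the new token's k/v into the cache.
    k_cache[cache_seqlen] = k_new
    v_cache[cache_seqlen] = v_new
    seqlen = cache_seqlen + 1

    # Attention scores against every cache slot, scaled by 1/sqrt(d).
    d = q.shape[-1]
    scores = k_cache @ q / np.sqrt(d)  # (max_seq,)

    # ALiBi: linear bias proportional to distance from the current position.
    positions = np.arange(k_cache.shape[0])
    scores = scores - alibi_slope * (seqlen - 1 - positions)

    # Key masking: slots beyond the valid cache length get -inf.
    # (For a single decoded query, causal masking reduces to this mask.)
    scores[seqlen:] = -np.inf

    # Softmax over valid keys, then the weighted sum of values.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_cache
```

A fused kernel computes this without materializing the full score vector per block, but the masking and in-place update semantics are the same.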

Future work will involve adding support for:

  • paged attention
  • local sliding window attention
  • rotary embedding

@micmelesse micmelesse changed the title enable kvcache enable flash_attn_with_kvcache Jul 10, 2024
@micmelesse micmelesse requested review from scxiao and vgokhale July 10, 2024 19:47
@micmelesse micmelesse marked this pull request as ready for review August 2, 2024 00:08
@micmelesse micmelesse merged commit 01a1329 into main_perf Aug 6, 2024
@micmelesse micmelesse deleted the micmelesse/enable_kvcache branch August 6, 2024 19:55
